.SEC(Storage#Structures and#Efficiency,,,P210:)
.SS(Introduction)
.FP
This chapter reconciles some of the generality of LISP with the
realities of contemporary data structures and machine organization.
Though any algorithm can be coded in terms of
manipulations of binary trees, often there are more
efficient organizations of data.
For example, our numerical algorithms could be expressed as list algorithms
using %3(#)%1, %3((#))%1, %3((#)#(#))%1, and so on, as representations
for %30%1, %31%1, %32%1, respectively.
Most machines supply hardware arithmetic representations and operations,
making such list representations unnecessary.
At the next level of data organization are vectors and arrays of numerals.
These
data structures could also be stored in a list-structure format
and individual components could be accessed by %3car-cdr%* chains.
However, most
machines have a hardware organization which can be exploited to
increase accessing efficiency over the list representation.
Sequential storage for elements, often coupled with hardware index registers
for fast access to elements, makes a more effective representation.
Similarly, strings can be represented as lists of characters. The
string processing operations are expressible as LISP algorithms;
again, this is usually not the most reasonable representation.
Some machines supply special hardware aids for string operations.
Even at
the level of list-structure operations, simple binary trees might not
be the most expeditious representation. Also, many
of the algorithms we have presented in LISP are overly wasteful of
computation time.
There are no general rules for selecting data representations and
choosing programming style. Efficiency must be balanced against generality.
The chosen representation
must match the machine and the problem domain being studied. If
the problem is strictly numerical, then list-structure is overly general.
If simple string manipulation is sufficient,
then list processing also may be too general. There are many applications
of list processing which are sufficiently well#behaved that
complex devices like garbage collectors are unnecessary.
However,
understanding the programming art in
a rich environment such as LISP prepares the programmer
to apply these
techniques in a meaningful way. Many times a
representation in LISP is all that is needed; a "throw-away
implementation" may answer the question. A clean
representation with comprehensible algorithms is developed first.
Once
%6a%* representation exists, it is easier to develop better ones.
.SS(Vectors and Arrays,,P137:)
.BEGIN TABIT2 (10,40);CENTERIT;FILL;
.FP
%2Vectors%*.##Vectors, also known as one-dimensional arrays, are usually
stored sequentially in memory. Simple vectors are usually stored one
element to a memory location though this is not a necessity; for
example, a vector representation of a complex number may be
stored as pairs of cells.
If vectors of nonhomogeneous data modes are contemplated,
each element would include type information.
Also, we have seen a representation
of a stack as a (sequential) vector with access
made via a global pointer to the vector.
In any case,
most languages place some restrictions on the behavior of vectors
so that elements can be accessed efficiently.
Vectors are an attractive representation when the size of data objects
will not vary. Given such a static behavior, machines can perform
access and updating of the elements rapidly.
.pt24;
.FP
%2Arrays%*.##Arrays are vectors which allow vectors as elements.
For example, a two-dimensional array is a vector whose elements
are vectors of individuals. We will restrict attention to
arrays whose elements are all of the same dimensions; efficient
representation of more general
arrays, called %2ragged arrays%1, will be examined in {yonss(P224)}.
We will restrict our attention further to
two-dimensional arrays, though
most of the discussion generalizes very
naturally.
Since
most machine memories are organized as linear devices, we map arrays onto a
linear representation.
A common implementation stores the array by rows; that
is, each row is stored sequentially - first, row 1; then row 2,#...
and so on.
A simple calculation finds the
location of an arbitrary element, A[i;j], given the location of
the first element A[1;1] and the length of each row of the
array. For an array A[1:M; 1:N],
.EQ;
loc[A[i;j]] = loc [A[1;1]] + (i-1)*N + (j-1)
.pt18;
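.FP
For example, for an array A[1:3; 1:4] with loc[A[1;1]] = 1000,
loc[A[2;3]] = 1000 + (2-1)*4 + (3-1) = 1006.
.pt18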
.FP
In languages like Fortran, which require that the size of the array be
known at compile-time, the compiler can generate the accessing code
as references to specific memory locations.
Languages like Algol 60 and some versions of LISP
allow the size of an array to be determined at run-time.
Algol 60, for example, requires the declaration of the
type (real, boolean, etc.) of the array and specification of the number of
dimensions in the array, but the
size specification of each dimension can be postponed until run-time.
To implement this flexibility,
a %2dope vector%1 is introduced.
A ⊗→dope vector↔← is a header or %2descriptor%1 associated with
the area containing the actual array elements. The information in the dope vector
tells the functions which access the array how to treat the data.
Type and dimensionality are typical entries in dope vectors.
The compiler can determine the size of
the dope vector, but cannot determine its contents.
The dope vector is filled in
when the array declaration is executed;
at that time the array bounds are known.
The
compiler cannot allocate space for the array elements; the
allocation must be done at run-time.
At that
time we allocate space and complete the dope vector. All
references to array elements must use the dope vector.
Assume that
the array elements are stored by rows. Look at the calculation of
the location of element A[i;j]. For a specific execution of an array
declaration much of this information is constant: the location of
the array elements, in particular A[1;1], and the number of columns
N are fixed.
.Group;
Thus we rewrite the calculation as:
.PT18;
\###constant part\variable part
\[loc [A[1;1]]-N-1]#######+\######(i*N+j)
.pt18;
The constant part is stored in the dope vector. When we wish to
reference an element A[i;j] we need only compute the variable part
and add it to the constant part.
.APART
The dope vector for A[1:M; 1:N] might contain
.BOXA
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
⊂ααααααααααααααα⊃
~ 2 ~
εααααααπααααααααλ
~ 1 ~ M ~
εααααααβααααααααλ
~ 1 ~ N ~
εαααααα∀ααααααααλ
~ constant part ~
εαααααααααααααααλ
~ . . . ~
array elements
~ . . . ~
%ααααααααααααααα$
.END
.BOXB
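.FP
The accessing computation itself might be sketched as a LISP-styled algorithm.
Here %3dv%*, %3constpart%*, and %3cols%* are hypothetical selectors giving
the dope vector of an array, its constant part, and its column count N;
%3fetch%* is a hypothetical primitive which references a machine location:
.BEGIN SELECT 3;GROUP;TURN OFF "←";
.KRK
.BOXA
access <= λ[[a;i;j] fetch[plus[constpart[dv[a]];times[i;cols[dv[a]]];j]]]
.BOXB
.END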
.FP
There is another scheme for storing arrays which is used in
some of the Burroughs machines#(⊗↑[Org#71]↑,#⊗↑[Dor#76]↑).
Each row is stored sequentially and
access to separate rows is made through a device called a
%2⊗→mother-vector↔←%*. The mother-vector is a vector of pointers to the
rows.
.GROUP;
Thus,
.BOXA
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "[" FOR "\";
.TURN OFF "\";
.TABS 57,66;
.KRK
⊂ααααααα⊃ ⊂αααααααααααααααααα [αααα[⊃
~ #ααβααα→~ A∞411∞* A∞412∞* ...[A∞41N∞*[~
εαααααααλ %αααααααααααααααααα [αααα[$
~ #ααβαα→ . . .
εαααααααλ
. . .
εαααααααλ ⊂αααααααααααααααααα [αααα[⊃
~ #ααβααα→~ A∞4M1∞* A∞4M2∞* ...[A∞4MN∞*[~
%ααααααα$ %αααααααααααααααααα [αααα[$
.END
.BOXB;
.APART
Notice that the accessing computation is very inexpensive.⊗↓However, access
by array columns can be expensive. If each row is on a separate
page in the machine, the access overhead can be substantial.←
On a virtual memory machine
this array organization can be helpful;
all rows need not be in memory at once.
If an access to a row not in
core is made, a "page#fault" is raised; the monitor brings the row
into memory and the computation continues. The mother-vector scheme
generalizes to multidimensional arrays, and can also be used in
conjunction with a dope vector.
.END
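.FP
Accessing in the mother-vector scheme can be sketched as two indexing
operations; %3mother%* and %3vref%* are hypothetical primitives which
select the mother-vector of an array and index a vector:
.BEGIN SELECT 3;GROUP;TURN OFF "←";
.KRK
.BOXA
access <= λ[[a;i;j] vref[vref[mother[a];i];j]]
.BOXB
.END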
An implementation of an array facility in LISP might include
a declaration:
.DEF
⊗→%3array%*↔←%3[%1<identifier>;<type>;<bounds>; ... ;<bounds>], where the
identifier names the array; the type could be numeric or S-expr; and finally
a declaration of upper and lower bounds for each dimension would be needed.
%3array%* is a special form whose effect is to make the array name a %3SUBR%*,
whose code is the calculation of the dope vector. Thus,
.BOXA
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
⊂ααααααπαααα⊃ ⊂ααααααααααααα⊃
~ SUBR ~ #αβ→αα→~ dope vector ~
%αααααα∀αααα$ ~ calculation ~
εαααααααααααααλ
~ array ~
~ elements ~
. . .
%ααααααααααααα$
.END
.BOXB
.FP
If we are to store S-exprs in the array, then the ⊗→garbage collector↔← must
be able to mark the entries. This is the reason for including type information.
When an array element is to be referenced, the subscripts are evaluated
(since the array name was declared as a %3SUBR%*) and the dope vector code
is executed. That computation results in a reference to the appropriate cell.
.GROUP;
We also must be able to store information in the array.
.DEF
%3store[%1<name>[<subscr>; ... ;<subscr>];<value>] : ⊗→%3store%*↔← is a special form
whose effect is to store the value of <value> in the designated array element.
.APART
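.FP
For example, assuming the bounds of each dimension are written as a lower
and an upper limit, %3array[A;FIXNUM;1;10]%1 might declare a one-dimensional
numeric array; %3store[A[3];4]%1 would then place %34%1 in the third element,
and a subsequent reference %3A[3]%1 would evaluate its subscript, execute
the dope vector code, and yield %34%1.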
We have discussed storage allocation and accessing of array elements.
Important distinctions in language design appear in discussing
deallocation of array space. Typical Algol-like languages impose
a stack discipline on storage management. This implies that the array
elements may be allocated in the run-time stack. It also implies that
the elements become inaccessible once the block which allocated
that array has been exited. This is implemented by popping the array from
the stack. There are two ways for a language to
assure that no references to "inaccessible" elements can occur
(such references are called %2dangling references%1).
Either restrict the semantics of the language such that no
such references can occur (Algol 60), or allow constructs which %6may%1
cause dangling references, but declare any
occurrence to be an error (Algol#68).
LISP-like languages suppose that data structures
are to be retained as long as they are accessible; that treatment is
also given to LISP arrays. Therefore arrays are allocated and deallocated
in a manner similar to the %3cons%1 operation for S-exprs; sequential blocks
are maintained in a free list; we will say more about this in {yonss(P224)}.
The two management philosophies for deallocation of data structures
are characterized as the %2deletion strategy%1 and the %2retention strategy%1;
see ⊗↑[Ber#71]↑.
.CENT(Problem)
.NL
1.##Implement a stack in LISP first using lists or dotted pairs, then using
an array. Include implementations of the stack operations.
.SS(Strings and Linear LISP,string processor,P27:)
.BEGIN TURN ON "←";
.FP
On {yon(P216)} we discussed one representation for LISP print names:
a linked list of full words; each
full word contained a segment of the
atom name. Print names are a special instance of a data structure
named strings; our use of strings in LISP has been restricted to
manipulating string constants. In this section we will discuss alternative
representations for strings, and discuss further operations on string objects.
Most production LISP systems have a comprehensive set of string operations.
As with numbers and vectors, string operations could easily be
represented as operations on S-exprs; however it is frequently more efficient
to represent strings as a separate abstract data structure.
Each string object is a sequence of characters. The elements
of a string may not be strings; this is the essential difference
between sequences and strings.
That simplification of data structures introduces some different
aspects of storage management. It is these issues which we will emphasize
in this section.
The primitive string manipulations --#constructors, selectors, recognizers,
and equality#-- are similar to those for sequences. Therefore we use
LISP M-expression syntax when describing the operations;
for that reason we call the string language, %2linear LISP%1.
The implementation
allows the creation of strings
of arbitrary length; it allows the generation of new
strings and the decomposition of existing strings. Since
arbitrary length strings are to be created, an organization similar
to free space will be used. The storage area for strings will be called
%2string space%1.
String space is a linear sequence of cells; each cell can
contain one character. A string will be represented as a
sequence of contiguous character cells.
The value of a string variable will be
represented as a pair, containing character count and a pointer to the beginning
of the character sequence in string space.
.GROUP;
Thus,
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.BOXA
.KRK
⊂αααπααα⊃
~ 3 ~ # ~
%ααα∀αβα$
~
%α→ααα→⊃
↓
. . .ααααπαααπαααπαααπαααπαααπααα . . .
A B B 1 D . . . string space
. . .αααα∀ααα∀ααα∀ααα∀ααα∀ααα∀ααα . . .
.BOXB
.END
.fp
encodes the string %3ABB%*.
.APART
.BEGIN TURN OFF"←";
.fp
There are two primitive selector functions: ⊗→%3first%*↔← and
⊗→%3rest%*↔←.
.END
.GROUP;
.DEF
%3first[x]%* is the first
character of the string represented by %3x%*. %3first%* is undefined for the
empty string,#%7e%1. For example,
.EQ
%3first[ABC]%1 is %3A; first[%7e%3]%1 = %9B%1
.APART
.GROUP;
.DEF
%3rest[x]%* is the string of characters which remains when the first
character of the string is deleted. %3rest%* is also undefined for the
empty string. For example,
.EQ
%3rest[ABC]%* is %3BC%*
.boxb
.APART
.GROUP;
.fp
There is one constructor primitive.
.DEF
%3concat[x;y]%* creates a new string. %3x%* is a character; %3y%* is
a string. %3concat%* forms a string consisting of the concatenation of
%3x%* with %3y%*. For example,
.EQ
%3concat[A;BC]%* is %3ABC%*
.APART
.boxb
.GROUP;
.fp
There are two primitive recognizers:
.boxa
.begin nofill;
.krk
⊗→%3char%*↔←%3[x]%1: is %3x%* a single character?
.pt2
⊗→%3null%*↔←%3[x]%1: is %3x%* the empty string?
.END
.APART
.GROUP;
.fp
.begin tabit1(27)
For example:\%3char[A] %1is %et%1
\%3char[AB] %1is %ef%1
.END
.pt18
.APART
.Fp
Finally, we include a version of the equality predicate which
will determine if two characters are identical.
.begin tabit1(22);
\%3x ⊗→%3=%*↔← y%1: are %3x%* and %3y%* the same character?
.pt2;pt2
\%3AB = AB %1is %9B%1
.end
.boxb
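.FP
These primitives suffice to express string algorithms in the usual recursive
style. For example, a sketch of an equality predicate on whole strings:
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(22);
.KRK
.BOXA
strequal <= λ[[x;y]\[null[x] → null[y];
\ null[y] → %ef%3;
\ first[x] = first[y] → strequal[rest[x];rest[y]];
\ %et%3 → %ef%3]]
.BOXB
.END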
.FP
The implementation of these string primitives is less complex
than that of LISP primitives.
%3first%* generates a character count of one and a pointer to the
first character of the parent string.
%3rest%* generates a character count of one less than that of the
parent and a pointer to the second character of the parent string.
Therefore, this implementation shares substrings, just as %3car%1 and %3cdr%1
share substructure.
The implementation of the recognizers and the equality predicate is straightforward.
We will blur the distinction between characters
and strings of length one. Thus %3char%* need only check the character count.
%3null%* gives %et%1 if the count is zero.
To implement equality, we note that characters are not stored uniquely,
so we must make an actual character
comparison.
As with full LISP, the implementation of the constructor requires more
care. Since our implementation requires that string components be contiguous,
we must copy the arguments to %3concat%1.
To evaluate %3concat[x;y]%1, we copy %3x%*, then copy %3y%*
so that %3y%1 follows %3x%1 in free string space; we
generate a character count of one plus the count of %3y%*, and
generate a pointer to the copy of %3x%*. The copies are
made in the free string space in a manner similar to that used in %3cons%1.
The storage management is somewhat different from that of
a simple LISP implementation.
Since the copying operations within %3concat%1 allocate space, we must
include some method for deallocating space. Though simpler methods
may suffice, we use a garbage collector.⊗↓Since string operations are quite
well-behaved, a reference counter could be used. We use a garbage collector
for its elegance and its
pedagogical value for the next section.←
The marking phase is much simpler than that for LISP; it is not
recursive. We use the
descriptor in the symbol table to mark each character string.
However, we cannot stop marking simply because we
have encountered a previously marked character. Since we are sharing
substrings, we must visit each character in the string.
The sweep
phase needs to be more comprehensive for string collection. Since
strings are stored sequentially, a fragmented string
space is of little use. We must compact all the
referenceable strings into one end of string space, and free a linear
block for the new free string space. Since we are sharing substrings,
a little care must be exercised. When we move a string,
the descriptor of any variable referencing any part of that
parent string must be changed to reflect the new location. So before
we begin the relocation of strings, we sort the string descriptors on
the basis of their pointers into string space. We then relocate
each parent string, moving it down into freed locations and updating
the address pointers in the descriptors of any substrings.
Eventually all strings will be compacted; the
string space pointer can be set and the computation continued.
Next, we adapt this
compacting garbage collection technique for use in LISP.
.END
.SS(A Compacting Collector for LISP,,P291:)
.P173:
.FP
We can combine the simplicity of the original mark-sweep garbage collector
with the sophistication of the collection phase of the string
garbage collector and
produce a compacting garbage collector for LISP.
There are several motivations for compacting storage. First, besides making the
%6active%* storage contiguous, we also make the %6free%* locations contiguous.
Thus the free lists can be handled as vectors rather than as lists.
This simplifies storage allocation: to allocate the next
free element, take the next element in the free space vector.
Another reason for
concern with compacting is related to hardware. If the underlying machine is
using a paging scheme, then we can try to minimize page-faults by keeping
the LISP structures localized. In the worst case, we could have every element
of a list on a separate page; this could require that the memory manager
retrieve a new page for every
reference.⊗↓Very little empirical work has been done on the actual storage
requirements and running environment of LISP.
A start is made in ⊗↑[Cl#76]↑; much more should be done.←
However,
we cannot restrict the operations of the LISP programmer. The underlying hardware
%2must%* be invisible to the user. The next best thing is to try to keep
the structures as local as possible; compaction of spaces is a first attempt
at this. We will discuss other lower-level tricks later.
Compaction is important in languages other
than LISP.
If the language allocates storage in a manner similar to LISP
but the constructs allow %6different%*-sized blocks to be specified
(a string processor is a simple example), then
compaction may be necessary.⊗↓As we shall soon see, the rationale is
applicable in LISP as well.←
Granted that compaction is a worthwhile endeavor, we proceed.
We can't simply mark all the
active cells and then move them into unmarked cells to compact the space.
We must also maintain the original topological relationships between the elements.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN OFF "\";TURN ON "?" FOR "\";
.TABS 45;
.KRK
.BOXA
⊂αααπααα⊃ ? ⊂αααπααα⊃
204 ~204~204~∞1 is not the same as ∞b ?100 ~204~204~
%ααα∀ααα$ ? %ααα∀ααα$
.BOXB
.END
Besides moving the cells, we must also update each reference to a moved location:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN OFF "\";TURN ON "?" FOR "\";
.TABS 45;
.KRK
.BOXA
⊂αααπααα⊃ ? ⊂αααπααα⊃
204 ~204~204~∞6 is∞1 the same as ∞b ?100 ~100~100~
%ααα∀ααα$ ? %ααα∀ααα$
.BOXB
.END
.FP
To handle these problems, we expand the sweep phase into two phases:
the %2relocating%* phase and the %2updating%* phase.
The relocating phase begins after all active structure is marked.
Assume we are to compact all active structure %6down%* to the
bottom of the space. First we initialize two pointers: a %2free%* pointer
to the lowest cell in the space; and an %2active%* pointer to the top of the space.
We move the active pointer down until we come across a %6marked%1 location;
we move the free pointer up until we locate an %6unmarked%* cell.
We want to move that marked cell down into the free location; but
we must also supply enough information to maintain the original relationships
in the transformed structure. The cell we move may reference other cells
which will be moved.
.BEGIN GROUP;
Here's a picture:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.krk
.BOXA
. . .
⊂αααπααα⊃
77 ~ 65~402~
%ααα∀ααα$
. . .
⊂αααπααα⊃
100 ~ ~ ~ ←∞1free pointer∞*
%ααα∀ααα$
. . .
⊂αααπααα⊃
155 ~ ~ ~
%ααα∀ααα$
. . .
⊂αααπααα⊃
204 ~402~ 77~
%ααα∀ααα$
. . .
⊂αααπααα⊃
402 ~204~402~ ←∞1active pointer∞*
%ααα∀ααα$
. . .
.BOXB
.END
.END
.FP
Cell %b77%* was active so we left it alone; it references cell %b65%*,
which has already been visited, and also references cell %b402%*, which is about
to move. We move the contents of cell %b402%* into cell %b100%*, and to
let everyone know where the contents has gone, we leave a forwarding address of
%b100%* in location %b402%*.
.BEGIN GROUP;
Thus,
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
⊂αααπααα⊃
100 ~204~402~ ←∞1free pointer∞*
%ααα∀ααα$
. . .
⊂αααπααα⊃
402 ~ ~100~ ←∞1active pointer∞*
%ααα∀ααα$
.BOXB
.END
.END
.FP
The active pointer, having writ, moves on;
it skips over any unmarked cells, looking for the next marked
location. Assume the next marked
location is %b204%*. It stops there and waits for the free pointer
to discover that location %b155%* is the next free location.
In its search the free pointer will skip over any marked cells.
The same relocation operation occurs: the contents of %b204%* is moved
to location %b155%*, and the forwarding address of %b155%* is placed in location
%b204%*. The process continues until the two pointers collide.
Call that collision point %2col%*.
When they meet, all locations above %2col%* either will be free or will
contain forwarding addresses. All addresses, %2col%* and below, will contain
marked words or relocated cells.
We are now ready to enter the %2update%* phase.
.BEGIN GROUP;
Here is the picture:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.krk
.boxa
. . .
⊂αααπααα⊃
77 ~ 65~402~
%ααα∀ααα$
. . .
⊂αααπααα⊃
100 ~204~402~
%ααα∀ααα$
. . .
⊂αααπααα⊃
155 ~402~ 77~
%ααα∀ααα$
. . . ← ∞2col∞*
⊂αααπααα⊃
204 ~ ~155~
%ααα∀ααα$
. . .
⊂αααπααα⊃
402 ~ ~100~
%ααα∀ααα$
. . .
.boxb
.END
.END
.FP
We examine the initial segment of our space from the bottom to %2col%*
looking for any references to that area %6above%* %2col%*. A reference
to that area must be changed. What is found in the referenced cell is not
the desired information, but is the forwarding address of the desired information.
What to do is obvious: tell the sender what the change of address is.
Thus the %3cdr%*-part of cell %b77%* becomes %b100%*; the %3car%*-part doesn't
change. Cell %b100%* refers to two relocated cells; we find their forwarding
addresses, and cell %b100%* becomes
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
⊂αααπααα⊃
100 ~155~100~
%ααα∀ααα$
.BOXB
.END
.FP
Similar treatment is given to cell %b155%*, modifying the %3car%*-part.
When all cells below %2col%* are updated, the garbage collection is finished.
The cells above %2col%* are all available for the free-list.
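.FP
The treatment of a single cell in the update phase can be sketched using the
pointer modification primitives %3rplaca%* and %3rplacd%* of {yonss(P171)}.
We assume, as in the pictures, that a forwarding address is kept in the
%3cdr%*-field of a relocated cell; %3loc%* is a hypothetical primitive giving
the machine address of a node, and %2col%1 is the collision point:
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(25);
.KRK
.BOXA
fwd <= λ[[p][greaterp[loc[p];col] → cdr[p]; %et%3 → p]]
.PT2
update <= λ[[x] prog[[]\rplaca[x;fwd[car[x]]];
\rplacd[x;fwd[cdr[x]]] ]]
.BOXB
.END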
.GROUP;
.CENT(Problems)
.NL
1.##Is %2col%* in the free space list after the update phase?
.NL
2.##Write a LISP algorithm for compacting garbage collection in LISP.
.APART;
.SS(Bit-tables,,P263:)
.FP
In the marking phase of a garbage collector it is
necessary to record the visitation of each word.
Frequently it is not possible to place a mark in the actual word.
This might occur for several reasons:
.SKIP 1
.NL;
%21.%1##For a word in FS, there is no
room if each word contains exactly two addresses.
.NL;
%22.%1# For a word
in FWS, the information would be changed if we modified a bit.
.NL;
%23.%1# In structures more complex than dotted pairs, there may not be room for
marking bits.
.NL;
%24.%1# If a mark bit is assigned in each word, then the initialize phase
requires that we visit each word. This violates
"locality of reference".⊗↓Locality refers to the relative distance between
memory locations assigned in a particular structure. In some machine organizations,
memory is divided into "pages" of a relatively small size. There is significant
overhead involved in crossing page boundaries. Therefore memory referencing
which entails many scattered references is said to violate "locality of
reference."←
.pt24;
.FP
An alternative solution designates a separate section of memory called a
%2bit-table%1. The bit-table is a sequence of binary flags such that there
is a one-to-one correspondence between a flag and a markable memory location.
Whenever we wish to record the visiting of a word, we set the corresponding flag
in the bit table.
A bit table is represented as a sequence of machine locations with several
flags represented in each word.
The initialization phase is improved since it is faster to initialize a whole
table rather than initialize single bits in separate words.
The mark phase is rapid if there
is a simple calculation to relate each
bit in a word with its corresponding markable location.
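.FP
For example, on a machine with %3W%* flags to a word, the flag for markable
location %3n%* is bit %3remainder[n;W]%1 of word %3quotient[n;W]%1 of the
table. A sketch, where %3btbase%* is the hypothetical base address of the
bit-table and %3set_flag%* is a hypothetical primitive which sets a
designated bit in a word:
.BEGIN SELECT 3;GROUP;TURN OFF "←";
.KRK
.BOXA
set_mark <= λ[[n] set_flag[plus[btbase;quotient[n;W]];remainder[n;W]]]
.BOXB
.END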
.SS(Representations of Complex Data Structures,,P138:)
.FP
In our discussion of abstract context-free data structures in {yonss(P287)},
we isolated three kinds of structures:
.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 ... %eD%41%1
e.g.,←<seq> ::= %2(%*<seq elem>%2,%* ..., <seq elem>%2)%*
.END
.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 | %eD%42%1 | %eD%43%1
e.g.,←<seq elem> ::= <indiv> | <seq>
.END
.BEGIN CENTERIT;
←%eD%1 ::= %eD%41%1 %eD%42%1 %eD%43%1 ... %eD%4n%1
e.g.,←<sexpr> ::= %3(%*<sexpr> . <sexpr>%3)%*
.END
.PT18;
.FP
We have discussed the behavioral characteristics of algorithms
which operate on these structures. Now we wish to examine
the storage structure aspects of these data structures.
Corresponding to these three
data structures are three "natural" storage representations.
By "natural" we mean that even though all these structures
can be represented as LISP S-expressions, for example, there
are representations which might better suit the operations
which we expect to perform on those structures.
Since "natural" is not a well-defined term, we will
clarify its meaning using examples of context-free data structures.
The first type of data structure given above maps naturally onto
a representation which contains information that the object is of
type %eD%1 and contains space for storing instances of this data type.
Elements of type %eD%1 are homogeneous, being all of type %eD%41%1;
however, the size of a type %eD%1 element is indefinite.
Depending on the operations which are to be performed on the representation,
either a list representation or an array representation is reasonable
for the storage structure. Unless the operations are quite
complex, a sequential allocation scheme suffices.
The second type of data structure is frequently represented as a pointer.
There really isn't any storage allocated for objects of this type.
Instances which satisfy this equation have their storage
requirements set by one of the %eD%4i%1 alternatives.
We will discuss pointer manipulation in LISP in the next section.
This section will discuss the third abstract data structure.
The essential characteristic here is that instances of this structure
have a fixed number of components, and
those components need not be of homogeneous type.
Those components are typically referenced by name.
These characteristics form a natural distinction between
this third class and the first class, even though an appropriate encoding
would make it possible to represent either class in the other.
.GROUP;
For example, in equations like
.BEGIN CENTERIT;
.PT18;
←<sexpr> ::= %3(%1<sexpr> . <sexpr>%3)%1
.ONCE INDENT 0;
or ←<form> ::= <function>[<arg-list>]
.END
.PT18;
.FP
we reference components by selectors like %3car%1, %3cdr%1, %3func%1,
and %3arglist%1.
.APART;
LISP represents instances of the above equations
as objects of the first and second types of data structure:
variable-length lists of pointers.
As a result, we have thought of these selectors as operations
which might require some nontrivial amount of computation
to discover the desired component, but as we saw in
{yonss(P290)} what is algorithm and what is data depends on your point of
view. For example, we could think of a dotted pair as an array which has
two components, one referenced by %3car%1, one referenced by %3cdr%1.
We say "array," since the number of components is known; but the
element references are done by nonnumerical names.
The natural storage requirements for such objects imply a fixed amount
of storage. That storage can be sequentially allocated since the
size of the element will not vary. The representation must also
encode the scheme for associating external selector with internal
representation.
.GROUP
For example,
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK;
.BOXA
⊂αααααπααααα⊃
~ CAR ~ ~
εαααααβαααααλ
~ CDR ~ ~
%ααααα∀ααααα$
.BOXB
.END
.APART
.FP
Notice that the array-referencing mechanisms have to solve a similar
problem. However, array representation is such that the dope vector
can perform a %6calculation%1 to locate the element.
The storage element which we are developing is called
a %2⊗→record↔←%1 (⊗↑[Pop#68]↑), or a %2structure%1 (⊗↑[Alg#75]↑,
⊗↑[EL1#74]↑), or a %2plex%1 (⊗↑[Han#69]↑).⊗↓A similar device, called a
%2hunk%1, has been implemented
in MacLISP ⊗↑[Ste#pc]↑.←
.BEGIN TURN OFF "←";
Besides the usual constructors, selectors and recognizers, records
may be supplied with a function
to modify components of a structure. This function is called an
%2updater%1.
Just as we can write %3A[43]#←#56%1 where %3A%1 is an array, an
updater function would implement a statement like %3car[x]#←#(A#.#B)%1.
.END
Updating of simple variables is called assignment.
A discussion of "updating" of more general
data structures requires a deeper understanding of
the implementation and storage structures. In the case of
LISP, it requires a discussion of
pointers. That is the topic of the next
section.
.SS(%3rplaca%* and %3rplacd%*,,P171:)
.BEGIN TURN ON "←";
.FP
The discussion in {yonsec(P188)} developed an implementation of the
LISP operations in terms of the manipulation of pointers.
Those manipulations allowed the
creation of new structure or allowed
sharing of an existing structure. None of these operations involved the
modification of an existing structure.
In this section we will discuss
some LISP coding
tricks which do involve modification operations.
.BEGIN GROUP;
First, consider
.boxa
%3←append <= λ[[x;y][null[x] → y; %et%3 → concat[first[x];append[rest[x];y]]]]%1
.boxb
.END
.FP
This function copies %3x%* onto the front of %3y%*.⊗↓Note: %3y%* is not copied.←
Or recall the %3subst%* function: it generates a copy with the correct
substitutions made; the substitutions are not made in the original S-expr.
Since it is the constructors which carry out the copying operations, and since
the application of a constructor may initiate the expensive operation
of garbage collection, we should examine the possible ways of reducing
copying.
Consider the expression %3append[(A B C);(D E)]%*. It appears that we
could get the effect of %3append%* by %3rest%*-ing down the list
%3(A B C)%* until we found the terminator; then we could replace
that terminator with a pointer to the list %3(D E)%*. Thus,
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
~ A ~ #αβα→~ B ~ #αβα→~ C ~≤'. . .→~ D ~ #αβα→~ E ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
.FP
The resulting structure does indeed look like one we would have
obtained from %3append%1. The operation we want to perform
%6modifies%1 the existing structure, instead of %6copying%1 it as %3append%1
would have done. Such modifications can cause serious difficulties.
.GROUP;
Let us call the modifying version of %3append%1, %3nconc%1; and consider the
execution of the following sequence of statements:
.BEGIN TABIT1(30);TURN OFF"←";SELECT 3;
%1first%3\i ← (A B C)
%1then%3 \j ← (D E)
%1and finally,%3\k ← nconc[i;j]
.pt18
.END
.APART
.FP
After execution of the third statement, %3k%1 would have
the expected value %3(A#B#C#D#E)%1.
However %3i%1 would %6also%1 have this value since we
modified the structure assigned to %3i%1.
Also, any value
which was sharing part of the structure of %3i%* will also be changed.
%3nconc%1 is a pointer modification procedure; its effect can be quite far-reaching
and unexpected. Exclusion of such features is one solution. However, a programming
language %6is%1 a tool, and used carefully, such features are valuable
for decreasing storage requirements and execution time.
Inclusion of such features must be done with care, however. The chance for
inadvertent application must be minimized.
With the preceding admonitions, we introduce the LISP pointer modification
primitives. Their appearance at this position in this text
indicates that such operations
are not critical to an understanding of programming languages, and also that
such features should not be used without a reasonable understanding of that
language.
Pointer modification functions for LISP are defined in terms of two primitive
operations; %3rplaca%1 replaces the %3car%1 pointer; %3rplacd%1 replaces
the %3cdr%1 pointer.
.BEGIN GROUP;
The expression ⊗→%3rplaca%*↔←%3[x;y]%* replaces the %3car%*-part of %3x%* with %3y%*.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.KRK
.BOXA
∞3dest∞* ∞3env∞*
~ ~
~ ⊂ααααααα⊃ ~ ⊂ααααααα⊃
%→ ~ #αβα⊃ %→ ~ #αβαα###→
εαααπαααλ ↓ εαααπαααλ
# # # ←$ ~ x ~ # βα→α⊃
~ ~ β→ - - →⊃ εαααβαααλ ~
εαααβαααλ ~ y ~ #αβ→⊃ ↓
~ ~ ~ ↓ %ααα∀ααα$ ↓ ~
# # # ⊂αααπααα⊃ ⊂ααααααα⊃ ~ ↓
εαααβαααλ ~ # ~ ? ~ ⊂ - - →~ ? ~←$ ~
~ ~ ~ ⊂→%α|α∀ααα$ | %ααααααα$ ~
%ααα∀ααα$ ↑ % - → - →$ ↓
%←ααα←ααααα←αααα←ααααα←ααααα←ααα$
.BOXB
.END
.LE(6,Algorithm for %3rplaca%1)
.END
.FP
The AMBIT/G description of %3rplacd%1 was given on {yon(P246)}.
.BEGIN GROUP
.fp
Now %3nconc%* can be defined as:⊗↓Since we're really involved with the representation
we use %3car%* and %3cdr%* rather than %3first%* and %3rest%*.←
.BEGIN SELECT 3;TABIT2(16,19);TURN OFF"←";
.KRK
.BOXA
nconc <= λ[[x;y] prog[[z]
\\[null[x] → return[y]];
\\z ← x;
\a\[null[cdr[z]] → rplacd[z;y]; return[x]];
\\z ← cdr[z];
\\go[a] ]]
.BOXB
.END
.END
.BEGIN SELECT 3;TABIT2(18,28);TURN OFF"←";GROUP;
.KRK
.BOXA
%1Consider:%*\prog[[x]\x ← (NOTHING CAN GO WRONG);
\\rplacd[cdddr[x];cddr[x]];
\\print[x]]
.BOXB
.END
.FP
This expression will generate a circular list.
Circular lists cannot be generated in LISP without functions like
%3rplaca%* and %3rplacd%*.
See the problem on {yon(P55)}.
In general, to circularize a nonempty
list, %3x%*, %3rplacd[last[x];x]%* suffices where
←%3last <=λ[[x][null[cdr[x]] → x; %et%3 → last[cdr[x]]]]%1
.BEGIN GROUP
←%2Problems%*
.NL
1.##What is the effect of evaluating %3rplacd[x;cdr[x]]%*?
.NL
2.##Recall the problem on ⊗→hash consing↔← on {yon(P154)}.
There we were contemplating unique storage for all S-exprs.
Can such a scheme be reconciled (efficiently) with functions like
%3rplaca%* and %3rplacd%*?
.NL
3.##It has been pointed out that %3rplaca%1 and %3rplacd%1 are
closely related to assignment statements ⊗↑[And#76]↑.
Extend one of our evaluators to recognize expressions like:
.BEGIN CENTER;TURN OFF "←";
.pt2;pt2
%3car[%1<form>%3] ← %1<form>
.END
.ONCE INDENT 4;
as abbreviations for:
.BEGIN CENTER;SELECT 3;
rplaca[%1<form>; <form>%3]%1
.END
.ONCE INDENT 4,4;
This extension of assignment is obviously not restricted to
%3rplaca%1 but could allow arbitrary forms on the left-hand
side of an assignment.
.END
.END
.SS(Applications of %3rplaca%* and %3rplacd%*,,P155:)
.BEGIN TURN ON "←";
We begin with rather simple examples. Consider the problem of inserting
an element into the middle of a list. For example let %3x%* be the list
%3(A B C)%*. If we wished to insert an atom, say %3D%*, between %3B%*
and %3C%*, we could perform
.pt18
.BEGIN CENTER; SELECT 3;turn off"←";
x ← cons[car[x];cons[cadr[x];cons[D;cddr[x]]]]
.END
.PT18
.FP
We recopy the initial segment
of %3x%*, adding %3D%* at the appropriate place.
In appropriate circumstances, we can use %3rplacd%1 to insert elements
into lists, using fewer %3cons%1 operations.
For example, given the list %3(A B C)%* with pointers %3x%* and %3y%* into it
as follows:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 12,12
.BOXA
x y
~ ~
↓ ⊂αααπααα⊃ ↓ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ A ~ #αβαα∀αα→~ B ~ #αβαα→~ C ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
.group
.fp
we could insert the element %3D%* %6after%* the first element in %3y%* by
###%3rplacd[y;cons[D;cdr[y]]]%*,##giving:⊗↓Notice
that %6one%* application of %3cons%* is unavoidable.←
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 8,8;
.BOXA
x y
~ ~
↓ ⊂αααπααα⊃ ↓ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ A ~ #αβαα∀αα→~ B ~ #αβα⊃ ⊂→~ C ~≤'~
%ααα∀ααα$ %ααα∀ααα$ ↓ ↑ %ααα∀αα$
↓ ~
⊂αααπααα⊃ ↑
~ D ~ #αβα$
%ααα∀ααα$
.BOXB
.END
.FP
But note that the value of %3x%* has also been changed, and
any S-expr sharing the list %3x%* or %3y%* as a sublist has also been affected.
.APART
.APART
.APART
We could delete the element %3D%1 by
.begin center;turn off"←";
.boxa
%3x ← cons[car[x]; cons[car[y];cddr[y]]]%1
.boxb
.end
We can also use %3rplacd%* to delete %3D%1 without using %3cons%1;
we delete not the %6first%1 element of %3y%1, but the next element
in %3y%* by
.EQ
%3rplacd[y;cddr[y]]%*
.PT18;
.FP
Similarly, we can use %3rplaca%* to modify an element in a list (or S-expr).
To change the first element in the list, %3y%*, to the S-expr %3z%* use
.EQ
%3rplaca[y;z]%*
.PT18;
.FP
Notice that the uses of %3rplacd%* for insertion and deletion are couched in terms
of insert %6after%* and delete %6after%*, rather than insert %6at%* or delete %6at%*.
If you look at a diagram you will see why.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 12,12;
.BOXA
x
~
↓ ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ A ~ #αβααααα→~ B ~ #αβαα→~ C ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
.fp
To delete the element %3B%* requires modifying the %3cdr%*-part of the predecessor
cell; a similar remark applies to insertion at a specified cell.
A simple, perhaps inefficient, scheme to support such modification
would be to start a second pointer
from the beginning of the list, looking for the cell whose %3cdr%* pointed to the
desired spot; then make the modification.
If these "modification-%6at%*" functions were to be performed very frequently,
then it might be worth starting %6two%* pointers down the list, one at %3x%*,
one, say %3y%*, at %3cdr[x]%*, as above. Then testing could be done using %3y%*
and the modification could be done using %3x%*. When we move %3y%* to
%3cdr[y]%*, we move %3x%* to %3cdr[x]%*. If we wanted to modify
%6before%* rather than %6at%*, we could proliferate the "back pointers," but
if this kind of generality is required a change of representation
is called for. We might resort to the double-linking scheme introduced on
{yon(P164)}; more complex representations are also discussed
in detail in ⊗↑[Knu#68]↑ Chapter 2.
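.FP
As an illustration of the first scheme, here is a sketch of a
"delete-%6at%*": given a pointer %3z%* to a cell in the list %3x%*, other
than the first, it runs a trailing pointer down %3x%* looking for the
predecessor of %3z%* and then splices around it:
.BEGIN SELECT 3;GROUP;TURN OFF"←";TABIT2(18,21);
.KRK
.BOXA
delete_at <= λ[[x;z] prog[[y]
\\y ← x;
\a\[eq[cdr[y];z] → return[rplacd[y;cdr[z]]]];
\\y ← cdr[y];
\\go[a] ]]
.BOXB
.END
.FP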
A LISP
implementation which stores p-lists as list structure would
use %3rplaca%* and %3rplacd%* heavily; for example, functions
which modify properties on the p-lists would use these functions. Here are the
two p-list manipulating functions, ⊗→%3putprop%*↔← and ⊗→%3remprop%*↔←.
.GROUP;
.DEF
%3putprop%* was introduced on {yon(P52)}.
Recall that the effect of %3putprop%* is to attach an indicator-value pair
to an atom. If the indicator is already present, then we will simply change its
value;
if the indicator is not present, then we will add the indicator-value
pair to the front of the p-list.
In the definition %3n%* is an atom, %3i%* is an indicator, and %3v%* is
the value to be stored.
.BEGIN SELECT 3;TABIT2(17,20);GROUP; TURN OFF"←";
.KRK
.BOXA
putprop <= λ[[n;v;i] prog[[m]
\\m ← cdr[n];
\a\[eq[car[m];i] → rplaca[cdr[m];v];return[v]];
\\m ← cddr[m];
\\[null[m] → rplacd[n;cons[i;cons[v;cdr[n]]]];return[v]];
\\go[a] ]]
.BOXB
.END
.FP
Note that extended conditional expressions are used in the definition.
.APART
.GROUP;
.DEF
%3remprop%* was also introduced on {yon(P52)}.
%3remprop%* is a predicate used to remove attribute-value pairs from
the property list of an atom.
We will capitalize on the LISP "%3NIL%1-non#%3NIL%1" trick
for predicates and return the removed property value if one is found.
The following implementation of %3remprop%1 does that.
.BEGIN SELECT 3;TABIT3(15,18,50);GROUP;TURN OFF"←";
.KRK
.BOXA
remprop <= λ[[n;i] prog[[m]
\\m ← n;
\a\[eq[cadr[m];i] → return[prog1[\caddr[m];
\\\rplacd[m;cdddr[m]]]]];
\\m ← cddr[m];
\\[null[cdr[m]] → return[%ef%3]];
\\go[a]]]
.BOXB
.END
.APART
.FP
where %3prog1%1 evaluates its arguments from left to right and
returns the value of its first argument.
Applications of %3rplacd%* occur inside ⊗→%3ratom%*↔←
when p-lists are built and added to the object list.
On {yon(P160)}
we will develop a version of LISP's parser which uses pointer modification to
gain efficiency when building the internal representation.
Pointer modification is also used
inside the ⊗→garbage collector↔←; examine the %3sweep%1 phase of the collector
on {yon(P257)}.
.P161:
Finally, pointer modification
allows the construction of self-modifying programs.
This technique is similar to the machine language tricks of self-modifying code
and should be used with similar care.
The freedom to hang yourself should not be construed as an invitation
to do so, but
it again points out the similarities of LISP to machine language and
highlights the differences between LISP and its contemporary high-level
languages.
LISP's central processor %3eval%* operates by traversing and interpreting
the data structure representation of the program; that data structure is also
open for inspection by LISP's data structure manipulating functions.
Since we now have list-modifying functions, we could modify a program
by changing its internal structure. Indeed we can write a program which
modifies its %6own%* structure.
.BEGIN TABIT2(14,16);GROUP;TURN OFF "←";
Here's one:
.SELECT 3;
.KRK;BOXA
foo <= λ[[x] prog[[y;z]
\\z←1;
\\y←sixth[body[foo]];
\a\print[x];
\\rplaca[rest[y];z←add1[z]];
\\go[a] ]]
.BOXB
.END
.FP
The mystery created by %3y%* is a pointer into the representation of the
statement %3print[x]%*; that representation is %3(PRINT#X)%*. Therefore
the effect of the first %3rplaca%* is to change %3(PRINT#X)%* to %3(PRINT#2)%*.
Subsequent passes through the loop will change the statement to print
%33, 4,%* and so on.
There really isn't much that can be said about such a program.
.CENT(Problems)
.NL
1.##More on %3ratom%*.
Recall the discussion of %3ratom%* in {yonss(P13)} and {yonss(P14)}.
Now that you know about %3rplaca%* and %3rplacd%* write a more detailed
version of %3ratom%*.
.NL
2.##On {yon(P16)} and {yon(P23)}
we wrote various styles of %3reverse%1.
All these functions used %3concat%*; however, we should be
able to reverse a list without using %2any%* new cells.
Express this
algorithm as a LISP function. If you use %3prog%*, don't use any %3prog%*-variables.
.END
.<<NUMBERS>>
.NEXT PAGE
.SS(Numbers,numbers,P293:)
.GROUP;
.FP
In many implementations of LISP, numbers are stored as very simple kinds of
atoms: they
are not stored uniquely, and do not need print names. Most implementations
allow fixed- and floating-point representation; therefore, indicators
for these properties are needed. Thus:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.BOXA
.INDENT 8,8;
fixed-point 1
~
↓
~ ⊂ααααααααπααα⊃ ⊂ααααα⊃
%→ ~ FIXNUM ~ #αβαα→~ 1 ~
%αααααααα∀ααα$ %ααααα$
floating-point 1
~
↓
~ ⊂ααααααααπααα⊃ ⊂ααααααααααααααααααααααα⊃
%→ ~ FLONUM ~ #αβαα→~ <machine rep of 1.0> ~
%αααααααα∀ααα$ %ααααααααααααααααααααααα$
.BOXB
.END
.APART;
.FP
The number is stored in FWS and the
type is indicated by a minimal property list.
This representation
is expensive in space and adds significant overhead to the execution of
arithmetic
operators. Several techniques have been used to improve LISP arithmetic.
Assume that the addressing
space of the machine is 2%818%* and that the usual size of a LISP
memory image is N; within the LISP system, all
references to memory locations greater than N are illegal.
We will use these illegal addresses to encode some of the
smaller positive and negative integers, mapping zero onto the middle
address, the positive numbers onto lower addresses, and the negatives
onto the higher addresses. These smaller integers, called
%2INUMS%*, are
represented by pointers outside of the normal LISP addressing space.
This trick can considerably decrease the storage requirements for
applications
which use small numbers extensively.
.GROUP;
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.TURN ON "?" FOR "\";
.TABS 32;
.BOXA
⊂αααααααααα⊃ 2∞818∞*
~?~
~?~
~?~ m ∞1<∞* 0
~?~
~?~ m = 0
~?~
~?~ m ∞1>∞* 0
~?~
~?~
εααααααααααλ N
~?~
~?~
~?~
~?~
~?~
~?~
~?~
~?~
%αααααααααα$ 0
.BOXB
.END
.LE(6,Picture of INUM Space)
.APART
.FP
The INUM representation adds some complexity since
the arithmetic operators now have to recognize these illegal pointers as
encoded numbers.
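.FP
A sketch of the convention: with %3Z%* naming the middle address of the
illegal region, a pointer %3p%* encodes an INUM exactly when it lies above
%3N%*, and its value is recovered by a subtraction. The names are
hypothetical:
.BEGIN SELECT 3;GROUP;TURN OFF "←";
.KRK
.BOXA
is_inum <= λ[[p] greaterp[p;N]]
.PT2
inum_val <= λ[[p] difference[Z;p]]
.BOXB
.END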
The MACLISP (⊗↑[Moo#74]↑) implementation uses a different representation for
numbers.⊗↓Much care went into the representation of numbers
in MACLISP. That LISP system
is used as the implementation language for MACSYMA
(⊗↑[MAC#74]↑, ⊗↑[Wan#75]↑, ⊗↑[Mos#74]↑),
a large algebraic and symbolic mathematics system. MACLISP's efficient
number facilities, coupled with its optimizing compiler,
have resulted in the production of compiled code which is more efficient
than that produced by DEC's FORTRAN compiler (⊗↑[Fat#73]↑).←
In that implementation, two spaces are allocated for number storage:
FIXNUM space and FLONUM space. This makes a more compact representation since
the type information is implied in the address of the object rather than
being explicitly
stored. To those basic spaces we add two temporary stack areas: FIXPDL and
FLOPDL. These areas are used for temporary arithmetic computation.
The temporary areas work in conjunction with a type declaration option
used to aid the MACLISP compiler.
If we know that certain variables are %6always%1
going to be used as numbers in a particular function, then we can compile
better code. Assume %3x%1 and %3y%1 are to be used %6only%1 as FIXNUMs
within a function %3f%1; we would make such declarations for the
MACLISP compiler
just as we can declare some variables as "special" to other
LISP compilers.
When we allocate space for %3x%1 and %3y%1, we allocate space on the
top of FIXPDL. Within %3f%1 the arithmetic operations use the hardware
arithmetic and reference the stack elements. The stack elements can be
passed to other arithmetic functions called within %3f%1, and no permanent
storage need be allocated in FIXNUM space until later.
The efficiency of arithmetic operations is dependent on the existence
of special hardware instructions for such arithmetic.
However, special hardware also places limits on the arithmetic
capabilities of most languages: arithmetic
is usually limited by the word size
of the machine.
There are
several versions of LISP which will automatically change
representation when faced with overflow.
This scheme is called %2arbitrary precision arithmetic%1 and
has been implemented for both fixed-point and floating-point
numbers. We will describe a representation for fixed-point numbers
called %2BIGNUMS%1; they could have the following structure:
.GROUP;
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.BOXA
.INDENT 10,10;
positive big number
~
↓
~ ⊂ααααααααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
%→ ~ POSNUM ~ #αβαα→~ # ~ #αβα→ ... →~ # ~≤'~
%αααααααα∀ααα$ %αβα∀ααα$ %αβα∀αα$
↓ ↓
N∞40∞* N∞4n∞*
negative big number
~
↓
~ ⊂ααααααααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
%→ ~ NEGNUM ~ #αβαα→~ # ~ #αβα→ ... →~ # ~≤'~
%αααααααα∀ααα$ %αβα∀ααα$ %αβα∀αα$
↓ ↓
N∞40∞* N∞4n∞*
.BOXB
.END
.LE(8,Structure of a BIGNUM);
.APART
.BEGIN GROUP;
.FP
The value of a BIGNUM is given by:
.EQ
%7b%1(N%40%1 + %7a%1N%41%*+ ... + %7a%8n%1N%4n%1)
.PT18;
.FP
where %7b%1 is either + or - and
%7a%1-1 is the largest number representable in one machine word.
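For example, taking %7a%1 = 100 for illustration, a positive BIGNUM with
N%40%1 = 41 and N%41%1 = 3 denotes 41 + 3*100, or 341.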
The translations between BIGNUMS and the other numeric representations
are done automatically.
On most implementations of LISP, no attempt is made to store numbers uniquely.
Thus %3eq%1 will not work on numbers other than INUMs; either
%3equal%1 is extended for numbers
or a special equality predicate for numbers is provided.
.END
.SS(Stacks and Threading)
.FP
Though recursive algorithms are usually more illuminating than
their machine-oriented counterparts, it is frequently more efficient
to encode those algorithms in manners which can take advantage of the
hardware. This section will discuss two techniques which "unwind" the recursion
and typically lead to faster execution.
Recall the marking phase of a garbage collector in {yonss(P146)}. There
we wrote %3mark%* as a recursive algorithm. We could equally well write %3mark%*
using an explicit stack:
.BEGIN TABIT3(16,23,43);SELECT 3;GROUP;TURN OFF "←";
.boxa
.krk
mark <= λ[[tr]prog[[st]
\loop\[is_marked[tr] → go[chk_st];
\\ is_full_wd[tr] → markA[tr];go[chk_st];
.PT2
\\ is_free_wd[tr] →\st←push[cdr[tr];st];
\\\markA[tr];
\\\tr←car[tr];go[loop];
\\ %Et%* → go[chk_st]]; ⊗↓%1This branch of the conditional could be omitted
and the effect would be the same.←%3
\chk_st\[null[st] → return[%et%*]];
\\tr←top[st];
\\st←pop[st];
\\go[loop] ]]
.boxb
.APART;
.group;
push <= λ[[i;st] concat[i;st]]
.PT2
top <= λ[[st] first[st]]
.PT2
pop <= λ[[st] rest[st]]
.boxb
.END
.FP
Notice that we save only the %3cdr%*-node in the stack; even at that, the
stack grows proportionally to the depth of the tree being traversed.
See the problem on {yon(P162)}.
The technique of using an explicit stack sometimes is more intuitive and
sometimes will lead to faster execution.
The second technique is more tricky but will lead to significant pay-offs
in execution time.⊗↓But there will be a proportional loss in clarity
in the code.←
The technique is called %2⊗→threading↔←%*.
The basis for threading is a desire to traverse tree structures in a more
efficient fashion than that typically available
implicitly in recursion, or explicitly via
stacks. Recall that on {yon(P164)} we surmised
that %2⊗→double-linking↔←%* might be
advantageous in moving up and down the "spine" of a tree structure.
Double links would allow us to find the successors and predecessors of nodes
easily. However the extra link gives us no help if we wish
to descend into the substructure. It is this area to which threading addresses
itself: descent into tree structure.
Examination of the new %3mark%* algorithm will reveal that for a
%6fixed%* tree and a %6fixed%* order of traversal,
any two applications of marking will
have the same pattern of behavior. The order of visitation to each node
will be the same, but more importantly,
the dynamic changes in the state of the stack
will %6also%* be the same. Instead of recomputing that portion of the stack on each traversal,
it might be possible to store the stack information in the structure itself.
Threading hides the control structure in the data structure.
Typically,
threading
requires a more complex data structure since we must store both
threads and links. The traversal algorithms also become more complex since
we must recognize the difference between
control threads and data links. Care must also be taken if we wish to share
threaded list structure. See ⊗↑[Knu#68]↑ for a complete discussion
of the techniques and tricks.
We do not wish to complicate the LISP structures, but dispensing with a
stack, be it implicit or explicit, does
influence storage requirements. We can
strike a compromise; instead of permanently storing the threads
in the structure,
we can %6temporarily%* store threads as we traverse trees.
The first application is in the design of a nonrecursive %3read%* program.
The second application we will describe is in the mark phase of a garbage
collector.
.CENT(Problem)
.P162:
.NL
1.##With a little
more testing before stacking we can significantly cut down the
number of %3push%*es we have to do. Namely, if some of the branches point
immediately to atoms we might as well mark them at that time and
proceed without doing a stack operation. Only when both branches are "nonatomic"
do we need to stack the %3cdr%*. Write such an algorithm.
Further, is it better to stack the %3car%1 nodes or the %3cdr%1 nodes?
.SS(A Non-recursive %3read%*)
.P160:
.FP
The original %3read%* algorithm of {yonss(P13)} is a good example of
a clear recursive algorithm;
it is reasonably straightforward to follow the flow of the algorithm.
However, now that we understand what the run-time behavior of such a recursive
program is, we see that %3read%* is a drain on %6two%*
resources: it uses
free-space to construct the internal representation of the input;
it uses the run-time stack in the implementation of the
recursion and for saving parameters and intermediate computations. A deeply
nested expression will use a lot of the run-time stack.
Clearly, there is nothing we can do about the drain on the free lists,⊗↓We
probably will be drawing on the full word area for print name
storage as well as on the free space area for
list structure storage.← but threading %6can%* dispense with the run-time
stack.
We can in fact do so without a proportional increase in the use of
free space; indeed we need only %6one%* additional free cell, regardless
of the complexity of the input! The algorithm will be much more complex than
the recursive parser, but that's why this section on storage and efficiency
is where it is. Now that the purpose and intent of %3read%* are
well understood, we can be
clever and efficient.
First we describe the basic ideas of the algorithm, then we give the algorithm.
The main idea in the algorithm is the realization that we can determine
the storage requirements for a complex S-expr or list structure as we read it in.
For example, consider
the input string "%3(A#(B#C)#D)%*". As we start our left-to-right scan of the
input,
we see "%3(%*". This immediately tells us that we need at least one %3cons%*.
We read "%3A%*"; that tells us what the %3car%* of the expression is. Notice that
we don't yet know whether the expression is "dotted" or "listed," but the
storage requirements will be the same. On reading the next open parenthesis we
know we need to add a new level in the developing representation. The
"%3B%*" and "%3C%*" add elements to that level, and the closing parenthesis finishes
it off. The closing parenthesis also should signal our parser to return to the
prior level and continue scanning the input. The "%3D%*" goes on that
level and the final closing parenthesis completes the input.
To implement this informal idea, we keep a thread in the %3cdr%*-part of
the last cell on every level. When we go down a level we manufacture a new
cell with the %3cdr%* pointing to the cell we just came from in the previous
level; this happens when we see a left parenthesis. We go up a level when we see
a right parenthesis; that is done by following up the thread in the current level,
after doing appropriate cleanup.
.group;
There are three basic states in the reader:
.NL
%21.%*##The next input should go into the %3car%*-part of the current cell.
This state is entered when we go down a level. It is labeled %3head%* in the
following program.
.NL
%22.%*##The next input should go on the current level. This is the typical state
in the building of a list-input. Here we add a new cell in the current level
and put the input in the %3car%*-part of that cell; then stay in this state.
This state corresponds to label %3tail%*.
.NL
%23.%*##The other main state occurs on reading a dot when in %3tail%*#state.⊗↓Dots
seen in any other context are errors.← In dot#state we check the next input;
if it is an atom we store it on the thread and
follow the thread. If the input is a left parenthesis
we add a new cell on the current level and enter %3head%*#state.
.PT24
.APART
There are some anomalies in the algorithm since it must
recognize both S-expr notation
and list notation. To handle both
kinds of input, we add a parenthesis counter; it increments for
left parentheses and decrements for right parentheses. A legal input has been
recognized when we are back at the top level and the count is zero.
The final difference between the old parser and the new one involves
the scanner %3ratom%*. We assume a new %3ratom%* which reads %3()%* and
returns %3NIL%*. If the scanner
sees an open parenthesis, it looks ahead to the next meaningful
character.⊗↓It ignores
spaces and the like.← If the character is a closing parenthesis,
the scanner takes it;
if the character is not, it is left for the next call on %3ratom%* and
%3ratom%* returns with an indication that it has seen a left parenthesis.
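.FP
Such a scanner might be sketched as follows. We assume a function %3peekc%1
which returns the next meaningful character of the input without consuming
it, a function %3readch%1 which consumes one character, and %3ratom1%1, the
old scanner, which gathers atoms, dots, and right parentheses. We also
assume that %3is_lpar%1 and %3is_rpar%1 apply to single characters as well
as to scanner values, and that %3LPAR%1 is the distinguished value
recognized by %3is_lpar%1. All of these names are illustrative, not part of
any particular implementation.
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(12);
.KRK
.BOXA
ratom <= λ[[] prog[[]
\[not[is_lpar[peekc[]]] → return[ratom1[]]];
\readch[];
\[is_rpar[peekc[]] → readch[]; return[NIL]];
\return[LPAR] ]]
.BOXB
.END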
.BEGIN TABS 12,21,35,47,49;TURN OFF "←";GROUP;
.TURN ON "\";NOFILL;
.KRK
.BOXA
With this introduction, here is the new %3read%*:
.boxa
.select 3;
read <= λ[[] prog[[j;cp;count;top;temp]
\\count←init[]; cp←count; top←cp;
\head\j←ratom[];
\\[or[is_dot[j];is_rpar[j]] → err[];
.PT2
\\ is_lpar[j] →\incr[count];
\\\cp←down[cp];
\\\go[head];
\\ atom[j] → stuff[cp;j]; go[ckend]];
\tail\j←ratom[];
\\[atom[j] → cp←insert_move[cp;j]; go[ckend];
.PT2
\\ is_rpar[j] →\decr[count];
\\\[eq[top;cp] → go[ck1];
\\\ %et%* → cp←stuff_up[cp;NIL]; go[ckend]];
.PT2
\\is_lpar[j] →\incr[count];
\\\cp←down[insert_move[cp;NIL]];
\\\go[head];
.PT2
\\is_dot[j] →\j←ratom[];
\\\[or[is_dot[j];is_rpar[j]] → err[];
\\\ is_lpar[j] →\incr[count];
\\\\\cp←insert_move[cp;NIL];
\\\\\go[head];
.PT2
\\\ atom[j] →\cp←stuff_up[cp;j];
\\\\go[ckend]]]; ⊗↓%1This %3go%1 is superfluous, but makes the flow more apparent.←%3
\ckend\[eq[cp;top] → go[ck1];
\\ %et%* → go[tail]];
\ck1\temp← cnt[top];
\end2\[zerop[temp] → return[exp[top]]];
\\j←ratom[];
\\[is_rpar[j] → temp←sub1[temp]; go[end2];
\\ %et%* → err[] ]]]
.BOXB
.END
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(29);
.KRK
.BOXA
init <= λ[[] cons[NIL;0]]
stuff <= λ[[x;y] rplaca[x;y]]
incr <= λ[[z] rplacd[z;add1[cdr[z]]]]
decr <= λ[[z] rplacd[z;sub1[cdr[z]]]]
insert_move <= λ[[cp;val] rplacd[cp;cons[val;cdr[cp]]]; cdr[cp]]
down <= λ[[cp] rplaca[cp;cons[NIL;cp]];car[cp]]
stuff_up <= λ[[cp;j] prog[[temp]
\temp ← cdr[cp];
\rplacd[cp;j];
\return[temp]]]
cnt <= λ[[x] cdr[x]]
.PT2
exp <= λ[[x] car[x]]
.BOXB
.END
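.FP
A word about the helper functions: the cell built by %3init%1 does double
duty, holding the parenthesis count in its %3cdr%*-part and, eventually,
the completed expression in its %3car%*-part. %3stuff%1 deposits an item
in the %3car%*-part of the current cell. %3insert_move%1 splices a fresh
cell onto the current level, passes the thread along to it, and moves
%3cp%1 there. %3down%1 opens a new level whose thread points back at the
current cell. Finally, %3stuff_up%1 closes a level: it deposits a proper
%3cdr%*-part, %3NIL%1 for a list or the atom seen after a dot, and returns
the thread so that %3cp%1 can move back up a level.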
.FP
The development and understanding of this algorithm requires most of what we
have covered in the course.
We use our knowledge of the parser, %3read%*; we use our familiarity with
S-exprs stored as linked lists; we have to understand the run-time control
of recursive calling sequences; we have to understand pointer manipulation;
we have to understand pointer modification; and finally we have to be wickedly
clever.
With that understanding we
were able to apply threading at a level higher than a "once#only" trick.
.CENT(Problem)
.NL
%21.%*##Write a version of %3read%* which uses an explicit stack to remember where it
is in the parse.
.SS(More Applications of Threading)
.Cent(A Link-Bending Garbage Collector for LISP)
.P144:
.FP
The use of a stack is one of the difficulties associated with garbage
collection. Garbage collection is invoked when available space has become
exhausted, but here we are asking for %6more%* space to use for stacking.
The usual solution
to such a problem is to allocate a separate area for stack storage. This
has its drawbacks. If we do not allocate enough stack space, a sufficiently
deep piece of structure will exhaust it and the marker will fail; and the
amount of stack space required can be large, proportional to the
depth of the structure being marked.
We can apply threading here,
modifying the structure as we traverse it; as usual the threads will be used as
control information. As we finish marking a branch we
restore the structure to its original topology.
Several versions of such threaded collectors are available; see
⊗↑[Chr#68]↑ for a version
written in AMBIT/G; a more traditional description is found in
⊗↑[Sch#67]↑;⊗↓The correctness of [Sch#67] has been proved by de#Roever.←
and see ⊗↑[Knu#68]↑ for several alternatives.
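.FP
To give the flavor of such a marker, here is a sketch of the link-bending
technique in the style of ⊗↑[Sch#67]↑. Besides the mark bit, manipulated
by the assumed primitives %3markA%1 and %3marked%1, each word is assumed
to carry one extra flag bit, manipulated by %3setflag%1, %3clearflag%1,
and %3flagged%1, which records whether we descended through the
%3car%*-part or the %3cdr%*-part; all five are assumed primitives. The
variable %3q%1 holds the chain of reversed links back toward the starting
point; no stack is used, and every link is restored as the marker retreats.
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT3(8,16,28);
.KRK
.BOXA
mark <= λ[[p] prog[[q;t]
\adv\[and[not[atom[p]];not[marked[p]]] →
\\markA[p]; t←car[p]; rplaca[p;q]; ⊗↓%1Reverse the %3car%1 link and descend.←%3
\\q←p; p←t; go[adv]];
\\[atom[p] → markA[p]];
\ret\[null[q] → return[NIL]];
\\[not[flagged[q]] →\setflag[q]; t←cdr[q];
\\\rplacd[q;car[q]]; rplaca[q;p]; ⊗↓%1Restore the %3car%1; the thread moves to the %3cdr%1.←%3
\\\p←t; go[adv]];
\\clearflag[q]; t←cdr[q]; rplacd[q;p]; ⊗↓%1Restore the %3cdr%1 and retreat.←%3
\\p←q; q←t; go[ret] ]]
.BOXB
.END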
.Cent(Binding Implementations)
.FP
Threading can be used in the shallow binder described in {yonss(P228)} to remember
the path through the environment tree#(⊗↑[Urm#76]↑).
We thread from E%4bind%1 to E%4inter%1
when we are looking for E%4inter%1.
This consists of reversing the access links as we proceed toward E%4inter%1.
Then, as we swap back the value cells, we will unthread from
E%4inter%1 to E%4bind%1.
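.FP
A sketch of the two phases, assuming each environment block carries an
access link read by %3alink%1 and set by %3setalink%1, and assuming
%3swapcells%1 exchanges the saved values in a block with the corresponding
value cells, might read as follows; all three names are illustrative.
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT2(10,14);
.KRK
.BOXA
thread <= λ[[e;goal] prog[[prev;next]
\lp\[eq[e;goal] → return[prev]];
\\next←alink[e]; setalink[e;prev]; ⊗↓%1Reverse the access link.←%3
\\prev←e; e←next; go[lp]]]
.PT2
unthread <= λ[[last;goal] prog[[succ;next]
\\succ←goal;
\lp\[null[last] → return[NIL]];
\\swapcells[last];
\\next←alink[last]; setalink[last;succ]; ⊗↓%1Restore the forward link.←%3
\\succ←last; last←next; go[lp]]]
.BOXB
.END
.FP
%3thread%1 returns the block nearest E%4inter%1; handing that block to
%3unthread%1 swaps the value cells back and restores the original chain.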
.SS(Storage Management and LISP,,P224:)
.FP
There are two basic areas of LISP which require attention:
the implementation of data structures, and the implementation of a
LISP machine. We will discuss applications in that order.
LISP's typical data object is a dotted pair; however,
dotted pairs are frequently used to represent more structured objects.
For example, many common LISP programs involve list#operations on
list representations. But lists, we know, are representations of sequences.
From {yonss(P137)}
we now also know that arrays are efficient representations of sequences.
Indeed array representations are typically more efficient than the
general LISP linked-list. We would like to capitalize on this more
efficient representation without jeopardizing the LISP operations.
An analysis of the LISP operations shows that we %6share%*
substructures and, when %3rplaca%* or %3rplacd%* is used, we
%6modify%* existing structures. Any proposed economies in storage layout
must take cognizance of these facts; fortunately the requirements are compatible.
.GROUP;
.FP
Consider the typical representation of the sequence:
.BEGIN CENTER;SELECT 3;
.BOXA
(LAMBDA (X) (F X Y))
.END
.PT2
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 6,6
.BOXA
⊂ααααααααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
~ LAMBDA ~ #αβαα→~ # ~ #αβα→~ # ~≤'~
%αααααααα∀ααα$ %αβα∀ααα$ %αβα∀αα$
~ ↓
⊂αααααααααα$ ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
↓ ~ F ~ #αβ→~ X ~ #αβ→~ Y ~≤'~
⊂αααπαα⊃ %ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
~ X ~≤'~
%ααα∀αα$
.BOXB
.END
.APART
.FP
This representation takes seven words.
The %3car%*-part of each cell contains the information;
the %3cdr%*-part tells where the rest of the expression is to be found.
That is, we have dedicated fourteen half-words to represent the structure,
only seven of which contain the actual information we wish to store.
Using some extra encoding we can carry the same information in
seven slightly larger cells.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.boxa
.indent 15,15
⊂ααααααααα⊃
~↓ LAMBDA ~
εαααααααααλ ⊂αααααααα⊃
~↓ #αβαααα→~/ X ~
εαααααααααλ %αααααααα$
~/ #αβα⊃
%ααααααααα$ ↓ ⊂αααααααα⊃
%αα→~↓ F ~
εααααααααλ
~↓ X ~
εααααααααλ
~/ Y ~
%αααααααα$
.boxb
.END
.FP
The intent of the special characters is to encode type information about
the %2next%* cell in the representation; the technique is thus called %2⊗→%3cdr%*-coding↔←%1.
The %b↓%* means the next cell %2is%* the %3cdr%*; %b/%* means the %3cdr%* is
%3NIL%*.
The typical LISP cell is a third variety of %3cdr%*-coding: the code %b→%* says
the next cell contains a pointer to the %3cdr%*. With that, we introduce the final code:
%b*%* means this cell is the %3cdr%*-half of a LISP word.
Thus %3(A#B)%* could be expressed in any of the following forms:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 6,6
.BOXA
⊂ααααα⊃ ⊂ααααα⊃
~↓ A ~ ~→ A ~
εαααααλ εαααααλ ⊂ααααα⊃
~/ B ~ ~* #αβααα→~/ B ~
%ααααα$ %ααααα$ %ααααα$
⊂ααααα⊃ ⊂αααααπααααα⊃ ⊂αααααπααααα⊃
~→ A ~ ~ A ~ #αβααα→~ B ~ ≤' ~
εαααααλ ⊂ααααα⊃ %ααααα∀ααααα$ %ααααα∀ααααα$
~* #αβαα→~→ B ~
%ααααα$ εαααααλ
~* NIL~
%ααααα$
.BOXB
.END
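.FP
The effect of the codes on the primitive operations can be sketched in a
few lines. Assume %3code[c]%1 extracts the %3cdr%*-code of a cell,
%3next[c]%1 addresses the physically following cell, and %3ptr[c]%1
fetches the pointer a cell contains; assume too that the four codes are
represented by the atoms %3DN%1#(%b↓%1), %3NILC%1#(%b/%1),
%3PTR%1#(%b→%1), and %3STAR%1#(%b*%1). A real implementation would of
course do this in hardware or microcode; the names are only illustrative.
.BEGIN SELECT 3;GROUP;TABIT1(14);
.KRK
.BOXA
ccar <= λ[[c] ptr[c]]
.PT2
ccdr <= λ[[c]\[eq[code[c];DN] → next[c];
\ eq[code[c];NILC] → NIL;
\ eq[code[c];PTR] → ptr[next[c]];
\ %et%* → err[]]] ⊗↓%1A %3STAR%*-coded cell is only the %3cdr%*-half of a word; taking its %3cdr%* is an error.←%3
.BOXB
.END
.FP
Notice that for the %b↓%1 and %b/%1 codes the %3cdr%1 is available without
any further memory reference; that is the source of both the space and the
speed advantage.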
However, this encoding scheme is not sufficient as it stands. Consider the
following example: given internal pointers %3x%* and %3z%* into
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 13,13;
.BOXA
x z
~ ~
↓ ⊂αααπααα⊃ ↓ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ F ~ #αβαα∀αα→~ X ~ #αβαα→~ Y ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
.FP
and assume we wish to perform %3rplacd[x;(A#B#C)]%*.
Using
our standard implementation, we would have:
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.indent 8,8
.BOXA
x z
~ ~
↓ ⊂αααπααα⊃ ↓ ⊂αααπααα⊃ ⊂αααπαα⊃
%αα→~ F ~ # ~ %αα→~ X ~ #αβαα→~ Y ~≤'~
%ααα∀αβα$ %ααα∀ααα$ %ααα∀αα$
↓
~ ⊂αααπααα⊃ ⊂αααπααα⊃ ⊂αααπαα⊃
%→~ A ~ #αβααααα→~ B ~ #αβαα→~ C ~≤'~
%ααα∀ααα$ %ααα∀ααα$ %ααα∀αα$
.BOXB
.END
.FP
However, a problem arises if
%3(F X Y)%* is represented in its compact form.
We can't replace the cell
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP;
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 8,8;
.BOXA
⊂ααααα⊃ ⊂αααααα⊃
~↓ X ~ by ~* #αβ→ to ∞3(A B C)∞*
%ααααα$ %αααααα$
.BOXB
.END
.FP
since the value of %3z%* would change. The solution is an application of the
forwarding address scheme we introduced on {yon(P173)} in the compacting
garbage collector.
We put a forwarding address in the cell referenced by %3x%*;
then allocate a new pair of half-cells, putting %3F%* in the first and a pointer to
%3(A#B#C)%* in the second.
.BEGIN SELECT b;NOFILL;PREFACE 0 MILLS;GROUP
.TURN OFF "%","#","α","←";
.TURN ON "∞" FOR "%";
.INDENT 6,6
.BOXA
x ⊂ααααααααα⊃ ⊂αααααααα⊃ ⊂αααααααα⊃
%→αα→~i #αβαααα→~↓ F ~ ⊂→αα→~↓ A ~
εαααααααααλ εααααααααλ ↑ εααααααααλ
z →α→~↓ X ~ ~* #αβα$ ~↓ B ~
εαααααααααλ %αααααααα$ εααααααααλ
~/ Y ~ ~/ C ~
%ααααααααα$ %αααααααα$
.BOXB
.END
.FP
These forwarding addresses are an instance of %2⊗→invisible pointers↔←%*
used by R. Greenblatt in his LISP machine; he has also implemented
hardware %3cdr%*-coding.
Between invisible pointers and
%3cdr%*-coding, we %6can%* effect all LISP operations using this
potentially more compact representation.
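.FP
The discipline might be sketched as follows, continuing the notation of
the %3cdr%*-coding sketch: every operation first chases invisible
pointers, and a %3rplacd%1 on a compactly coded cell copies the cell and
leaves a forwarding address behind. We assume %3is_inv%1 and %3inv%1 test
for and follow an invisible pointer, %3alloc2[]%1 delivers two fresh
adjacent cells, %3setptr%1 and %3setcode%1 store into a cell, and
%3mkinv%1 overwrites a cell with an invisible pointer; all of these are
hypothetical.
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT1(12);
.KRK
.BOXA
chase <= λ[[c] [is_inv[c] → chase[inv[c]]; %et%* → c]]
.PT2
crplacd <= λ[[x;y] prog[[c;n]
\c←chase[x];
\[eq[code[c];PTR] → setptr[next[c];y]; return[c]];
\n←alloc2[]; ⊗↓%1The cell is %b↓%1- or %b/%1-coded; recopy it in full-word form.←%3
\setptr[n;ptr[c]]; setcode[n;PTR];
\setptr[next[n];y]; setcode[next[n];STAR];
\mkinv[c;n];
\return[n]]]
.BOXB
.END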
We must be able to maintain that compact representation while the program
is running. This requires more care in the management of storage.
We cannot simply garbage collect and let the space fragment; nor can we
use the simple compacting garbage collector
discussed in {yonss(P291)} since it does not
attempt to maintain the compact representation.
Several algorithms
with the desired properties exist (⊗↑[Che#70]↑, ⊗↑[Cla#76]↑).
One feature of this data representation is its use of variable-sized
linked blocks of sequential storage. The management of these storage
blocks is more complex than that of simple dotted#pairs, but the
additional overhead may be acceptable if it gives better locality of reference
and faster access to list elements.⊗↓Notice that the %3cdr%1-coded representations
of %3(A#B)%1 and %3(A#.#B)%1 are equally expensive. In the typical linked-list
representation, %3(A#B)%1 requires more space than %3(A#.#B)%1.←
There is less conflict about the use of more complex storage
management techniques in the area of LISP's dynamic implementation.
The original versions of LISP#1.5 used dotted pair structure to represent
the access environments.⊗↓The control information did use a stack implementation
coded in machine language.← This generality gave a correct solution to the
implementation of %3function%1, but experience with LISP implementations
has shown that it is quite expensive to maintain this generality
when most applications are of a less general nature.
Implementation techniques, patterned after our Weizenbaum
diagrams, allow some economies without loss of generality.
Again, storage would be allocated in sequential blocks; each block would
be of a size sufficient to hold the representation of the name-value entries
along with the additional areas to link the block to the environment.
The storage blocks need not be allocated sequentially; indeed, in the
general case blocks cannot be allocated sequentially. The de-allocation
problems are somewhat different from those experienced
by data structure representations.
The environment structures are much more "well#behaved" than general list-structures.
Therefore an "environment garbage collector" may not be needed.
The most general techniques for management of LISP's dynamic
environment are based on ⊗↑[Bob#73a]↑ and succeeding papers.⊗↓There is something
contradictory about LISP implementors' attitudes toward storage
and dynamics. Much effort is expended in attempting to
minimize the overhead involved in the dynamic operation of LISP;
it is frequently stated that users should not be penalized
for access/control constructs which they do not use. However, that
attitude is not extended to LISP's data structures. There are very
generous subsets of LISP applications in which the data structure operations
are sufficiently well#behaved that storage reclamation techniques
less general than garbage collection are applicable. Analysis of this
area of LISP should lead to profitable results.←
At a lower level of implementation, LISP has much to say about machine
organization. The implementation of efficient environment-swapping
algorithms is a problem which any operating system must face.
The traditional solutions impose severe restrictions on interprocess
communications. The algorithms developed for LISP show promise
for giving efficient implementations of more general scope.
LISP's organization of memory also has lessons for machine architecture. The
management of large variable-sized memory spaces (⊗↑[Ste#73]↑,#⊗↑[Wegb#70]↑)
can be supported in hardware. The allocation and de-allocation of such large
spaces also require care; LISP implementors have begun to address these problems
(⊗↑[Ste#76a]↑, ⊗↑[Bis#74a]↑).
.SS(Hash Techniques,,P248:)
.FP
One phenomenon of LISP is the
sheer size of data structures which a large LISP program generates.
Many LISP projects approach 10%87%1 bits of program and data.
Several techniques have been developed to help shrink data representation;
%3cdr%1-coding ({yonss(P224)}) is one technique. Another technique stems
from the observation that LISP tends to %6copy%1 structures rather
than %6share%1 them. We know that the sharing of structures must be done
with great care if modification operations like %3rplaca%1 and %3rplacd%1
are present, but sharing of structure can mean a significant saving in space.
In fact, the saving in space can also speed up the algorithms which manipulate
the structures. For example, if every list#structure is stored uniquely,
then the time for the equality test %3equal%1 is a constant rather than being
proportional to the depth of the structure.
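.FP
Indeed, under such a regime the test collapses to a single pointer
comparison; a minimal sketch, assuming %3eq%1 is meaningful on any pair of
pointers, atomic or not:
.BEGIN SELECT 3;GROUP;
.KRK
.BOXA
equal <= λ[[x;y] eq[x;y]]
.BOXB
.END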
We present two techniques for maintaining unique structure:
either maintain list space such that unique representations are always
present or supply an algorithm which will "uniquize" structures upon
request. The first alternative is usually called %2⊗→hash consing↔←%1; the
second technique is called %2list condensation%1#(⊗↑[Lin#73]↑).
A condensation algorithm must remove all duplicated structure from within
a list. Since condensation is a component of many hashed LISP implementations,
we will concentrate our attention on hash consing.
Hash consing is an extension of the LISP technique for generating
unique atoms. Since
list structure is created only by the %3cons%1 operation,⊗↓However, list structure
may be modified by %3rplaca%1 and %3rplacd%1.←
we place the responsibility for maintaining unique structure within %3cons%1.
If the result of a pending %3cons%1 is already present, we
return a pointer to that structure, otherwise we perform the %3cons%1
and record the result so that it will be retrieved if the same
%3cons%1 happens again. The adjective "hash" is applied to this version of
%3cons%1 since the typical implementation uses a hashing algorithm to
maintain the uniqueness.
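.FP
A sketch of such a %3cons%1, here called %3hcons%1, might run as follows.
We assume %3hash%1 maps a pair of pointers to a bucket index, and
%3bucket%1 and %3setbucket%1 read and set the bucket table entries; these
names are illustrative only.
.BEGIN SELECT 3;GROUP;TURN OFF "←";TABIT2(10,14);
.KRK
.BOXA
hcons <= λ[[x;y] prog[[b;l]
\\b←hash[x;y]; l←bucket[b];
\lp\[null[l] → go[new]];
\\[and[eq[car[car[l]];x];eq[cdr[car[l]];y]] → return[car[l]]];
\\l←cdr[l]; go[lp];
\new\l←cons[x;y];
\\setbucket[b;cons[l;bucket[b]]]; ⊗↓%1The bucket lists themselves are built with the ordinary %3cons%1.←%3
\\return[l]]]
.BOXB
.END
.FP
The uniqueness invariant holds only for cells created through %3hcons%1;
a matching pair, if one exists, is found in the bucket by pointer
comparisons on its two halves.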
Hash %3cons%1ing imposes restrictions on the
programmer's use of the list modification operations: since structures are
shared, a modification made through one pointer is seen through all of them.
An implementation may disallow list modification entirely; it may supply
additional operations to copy a structure, modify the copy, and "uniquize" the
result; or it may supply different kinds of structures, some modifiable and some not.
A hash %3cons%1 was proposed for LISP 1.75 on the IBM#M44, but the
implementation was never
completed. A limited version of hash %3cons%1ing was implemented as
an extension of LISP 1.6 at Stanford.
Impressive and extensive applications of hashing appear
in HLISP (⊗↑[Got#74]↑, ⊗↑[Ter#75]↑). That implementation of LISP
supplies two different kinds of structures: ordinary list structure
and "monocopy" structures. Operations are also supplied for
conversion between types. Extensive analysis of hashing effects
on algorithm performance has also been done (⊗↑[Got#76]↑).
HLISP also employs hashing in its implementation of property lists.
Property lists are not stored explicitly, but rather the atom name and the
property name are used to form a hash index; the property value is associated
with that hash index.
For example,
%3get[x;i]%1 hashes with both %3x%1 and %3i%1 to retrieve the property value.
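.FP
Schematically, assuming %3hash2%1 forms an index from the two names and
%3hget%1 and %3hput%1 retrieve and associate a value under such an index,
with collision handling elided, the property operations might read:
.BEGIN SELECT 3;GROUP;
.KRK
.BOXA
get <= λ[[x;i] hget[hash2[x;i]]]
.PT2
putprop <= λ[[x;i;v] hput[hash2[x;i];v]]
.BOXB
.END
.FP
This conveys only the flavor of the HLISP scheme; the actual implementation
must also handle collisions and the removal of properties.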
The other major implementations of LISP also
offer specialized operations for dealing with hashed quantities;
see ⊗↑[Moo#74]↑, ⊗↑[Int#75]↑, and ⊗↑[Bob#75]↑.